The soybean genome assembly has been available since the end of 2008. Significant features of the genome
include large, gene-poor, repeat-dense pericentromeric regions, spanning roughly 57% of the genome
sequence; a relatively large genome size of ~1.15 billion bases; remnants of a genome duplication that occurred
~13 million years ago (Mya); and fainter remnants of older polyploidies that occurred ~58 Mya and
>130 Mya. The genome sequence has been used to identify the genetic basis for numerous traits, including
disease resistance, nutritional characteristics, and developmental features. The genome sequence has provided 
a scaffold for placement of many genomic feature elements, both from within soybean and from related spe- 
cies. These may be accessed at several websites, including http://www.phytozome.net, http://soybase.org, 
http://comparative-legumes.org, and http://www.legumebase.brc.miyazaki-u.ac.jp. The taxonomic position of 
soybean in the Phaseoleae tribe of the legumes means that there are approximately two dozen other beans 
and relatives that have undergone independent domestication, and which may have traits that will be useful 
for transfer to soybean. Methods of translating information between species in the Phaseoleae range from de- 
sign of markers for marker assisted selection, to transformation with Agrobacterium or with other experi- 
mental transformation methods. 

Key Words: Glycine max, soybean, legume evolution, polyploidy, SoyBase, Legume Information System, 
Legumebase, Phytozome. 



Introduction 

The soybean genome sequence was assembled and made 
available in late 2008. Since then, the genome sequence has 
been used to identify numerous genes for traits of interest. 
The structure of the soybean genome has been complicated 
by an episode of polyploidy that was followed by genomic 
rearrangements, expansion of pericentromeric regions, and 
gene losses and duplications. Nevertheless, substantial con- 
servation remains between soybean and other cultivated 
bean relatives, and the genomic duplication makes possible 
some intriguing glimpses into the history of genome evolution
over the ~13 million years since the occurrence of this
polyploidy event.

The primary structural features of the soybean genome assembly

The soybean genome sequence was assembled in late 2008
from ~8.5-fold whole-genome shotgun coverage that consisted
of paired-end Sanger reads from three BAC libraries,
and fosmid libraries of several size classes (Schmutz et al.
2010). Although this review does not attempt to repeat the
content of the report of the soybean genome sequence (Schmutz
et al. 2010), several features from this report are worth
highlighting in this context. The assembly is estimated to be
approximately 85% complete, with most of the missing
sequence believed to consist primarily of repetitive sequence
in the pericentromeric regions. This means that nearly all
genes are expected to be present in the genome sequence,
either in the 20 chromosomes (or "pseudomolecules", in
reference to the fact that the assembled sequence is only an
approximation of the true chromosomal sequence), or in the
remaining small assembly scaffolds that could not be
confidently placed within the pseudomolecules.

Communicated by J. Abe

Received July 28, 2011. Accepted September 19, 2011.
*Corresponding author (e-mail: steven.cannon@ars.usda.gov)

A first observation about the genome is its size. The soy- 
bean genome is moderately large in comparison to most 
other plant genomes that have been sequenced to date. At 
~1,150 million basepairs (Mbp), it is more than eight times
the size of the Arabidopsis genome (125 Mbp), almost two
and a half times the size of the genomes of the model legumes
Medicago truncatula and Lotus japonicus or the grape
genome (each is ~450-470 Mbp), roughly double the size of
common bean and poplar (625 and 550 Mbp, respectively),
but less than half the size of maize (2,300 Mbp) (Bennett
and Leitch 2010, Cannon et al. 2006, Tuskan et al. 2006,
Wei et al. 2009). The number of predicted coding genes in
soybean is also relatively high, at ~46,400, vs. ~26,500 in
Arabidopsis, ~30,400 in grape and ~45,000 in poplar
(Schmutz et al. 2010, Sterck et al. 2007). Both the relatively
large genome size and high gene count are likely due to the
recent polyploidy in soybean's history.

Fig. 1. Duplicated segments within the soybean genome. Colored blocks to the left of each chromosome show regions of correspondence with
chromosomes of the same color. For example, the light blue blocks at the top of Gm09 correspond with regions on the light blue Gm15, and vice
versa. These correspondences are remnants after the Glycine genome duplication. Locations of centromeric repeats are shown as black rectangles
over the chromosomes. Regions lacking internal correspondences (generally near chromosome centers) mark the approximate locations of the
gene-poor pericentromeres. This figure is derived from the CViT genome search and synteny viewer (Cannon and Cannon (submitted)) at the
Legume Information System, http://comparative-legumes.org/.

A whole-genome duplication (WGD) is one of the most 
striking features of the soybean genome. Evidence of the 
WGD is apparent when the genome is compared with itself. 
The result is a mosaic of chromosomal regions that show in- 
ternal synteny, or runs of genes that are in the same orders 
and orientations in other parts of the genome (Figs. 1, 2). The
synteny blocks shown here sometimes extend to tens of
millions of bases (essentially, to the scale of chromosome arms,
relative to the ~50 million-base chromosomes). The blocks
are, however, interrupted by small insertions, deletions, or
inversions, testament to the many rearrangements that have
occurred in the genome since the WGD. The WGD has been
dated to between ~5 and ~13 Mya (Doyle and Egan 2009,
Schmutz et al. 2010). Besides uncertainties due to choice of 
evolutionary rate terms to apply to measured divergences, 
there is also uncertainty about whether the event was auto- 
polyploidy (derived from a single species) or allopolyploidy 
(derived from different species). If the latter occurred (as 
suggested by the existence of two divergent sets of centro- 
meric repeats (Walling et al. 2006)), then it is possible that 
the species may have been separate for some millions of 
years prior to the genome fusion and resulting polyploidy. In 
any case, the extent of current divergence between the corre- 
sponding ("homoeologous") chromosomes can be measured. 
The similarity between coding sequence in paralogous genes
in recently duplicated regions is indicated by a modal
percent nucleotide identity of 93-94%. Outside of coding
regions, sequences have generally changed too extensively
to allow alignments (Cannon, unpublished information). A
practical consequence of the high similarity in coding
sequence among paralogs is that sequence-homology-based
methods such as RNAi, PCR and DNA hybridization may
affect both WGD-derived paralogs.

Fig. 2. Close comparison of two soybean chromosomes. The soybean
chromosome 10 assembly (Gm10, horizontal) and chromosome 20
assembly (Gm20, vertical) are shown. Each dot represents homology of
predicted coding sequences in the two chromosomes. Faint dotted
lines show the boundaries of smaller sequence assemblies that were
ordered to produce the chromosome-scale assemblies. Diagonal
features in the upper right quadrant indicate corresponding regions
between these two chromosomes. A large inversion is indicated by a line
of homology dots that slopes down and to the right. The interrupted
diagonal toward the center has been disrupted by transposon insertions
in pericentromeric regions in both chromosomes. The pericentromeric
regions are also marked by higher densities of dots (homologies) in
roughly the lower-left two thirds of the space, primarily caused by
retrotransposon sequences.

Fig. 3. Genetic vs. physical distances for chromosome 10.
Sequence-based genetic markers (cM units, vertical axis) have been compared
with the soybean chromosomal genome assembly to determine their
physical locations (100 kb units, horizontal axis). The pattern of steep
slopes at the chromosome ends and flat slopes in the centers is
common across all 20 chromosomes, and corresponds with high rates of
recombination in the gene-rich euchromatic chromosomal ends and
suppressed recombination in the repeat-rich, gene-poor chromosomal
centers.

Besides the Glycine WGD, the soybean genome has also 
been strongly shaped by at least two previous rounds of 
genome duplications: one at around 58 million years ago, 
near the origin of the papilionoid legume subfamily; and a 
genomic triplication that occurred before the radiation of the 
Rosid or Fabid clade, before 130 million years ago.
Altogether, these polyploidies have resulted in up to 12
homoeologous genomic copies of any given genomic region.
Typically, a genomic region will be closely related to one other
region (via the recent duplication); more distantly related to
two other regions (via the early legume duplication plus the
Glycine WGD); and faintly similar to up to eight
other regions (via the pre-Fabid triploidy, the legume
duplication and the Glycine WGD). While paralogous genes
from the Glycine WGD typically have ~93-94% identity,
paralogs from the early legume WGD typically have
~75-79% identity. As a consequence of soybean's duplication
history, most genes exist at least in duplicate, even for small gene
families. Only the paralogs from the Glycine duplication 
tend to be similar enough to cause complications during 
standard lab procedures, but similar gene functions may 
have been retained across the older paralogous duplications. 
This means that gene discovery through knockout may be 
more difficult in soybean than in some less-duplicated plant 
genomes, and may mean that there are more loci and QTLs 
to follow for some soybean traits. Similarly, assuming no 
gene losses or additional duplications, a gene whose func- 
tion has been identified in Arabidopsis may have four equi- 
distant paralogs in soybean, and eight somewhat more dis- 
tant paralogs via the Fabid triplication. 
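
The copy-number arithmetic in the paragraph above can be made explicit.
The following is a minimal Python sketch, assuming no gene losses or
additional duplications (as the text does). The event multiplicities
come from the text; the function and variable names are illustrative
and do not come from any soybean software.

```python
# Multiplicity of each polyploidy event named in the text, oldest first.
EVENTS = [
    ("pre-Fabid triplication", 3),
    ("early legume WGD", 2),
    ("Glycine WGD", 2),
]


def copy_counts(events):
    """Return (total copies, {event: regions related to one focal region via that event})."""
    total = 1
    for _, multiplicity in events:
        total *= multiplicity

    related = {}
    younger_copies = 1  # copies produced by events younger than the current one
    for name, multiplicity in reversed(events):
        # Each other branch created at this event was later multiplied
        # by every younger event.
        related[name] = (multiplicity - 1) * younger_copies
        younger_copies *= multiplicity
    return total, related


total, related = copy_counts(EVENTS)
print("total copies:", total)
for name, count in related.items():
    print(f"{name}: {count} related region(s)")
```

Under these assumptions the sketch reproduces the counts given in the
text: 12 copies in total, of which one is related to a focal region via
the Glycine WGD, two via the early legume duplication, and eight via
the older triplication.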

Another prominent feature in the soybean genome is the 
large, distinct pericentromeres in all of the chromosomes. 
These comprise approximately 57% of the current assembly 
(Schmutz et al. 2010). They are repeat-dense and gene-poor, 
and have extremely suppressed rates of recombination. 
Suppressed recombination is evident in the plot of genetic 
distance vs. physical (sequence) distance for chromosome 
10 (Fig. 3). The long, nearly horizontal run of dots repre- 
sents approximately 45 genetic markers with virtually the 
same cM position, but spanning 55% of this ~51 Mbp chro- 
mosome. However, although the pericentromeric regions are 
gene-poor relative to the euchromatic chromosome arms, the 
pericentromeres do contain a large number of genes in total: 
more than 21% of the predicted high-confidence genes come
from the pericentromeres (Schmutz et al. 2010). An
implication of this finding is that ~1/5th of the gene complement
occurs in regions of the genome that only rarely recombine.
This has consequences for QTL mapping of traits in these
regions, and for attempts to break linkages between desirable
and undesirable traits in the pericentromeres. One such
example is the soybean seed protein QTL on linkage group I
(LG I / Gm20). This QTL, flanked by two non-segregating
markers, nevertheless spans ~8.4 Mbp in Gm20 because it is
located within a pericentromere (Bolon et al. 2010). 
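
The contrast between chromosome ends and centers described above can be
summarized as a local recombination rate (cM per Mbp) between adjacent
markers. The sketch below is only an illustration: the marker positions
are invented to mimic the general shape of Fig. 3 and are not the actual
chromosome 10 data, and the 1 cM/Mbp cutoff used to label an interval as
suppressed is an arbitrary assumption.

```python
def local_recombination_rates(markers):
    """markers: iterable of (physical position in Mbp, genetic position in cM).

    Returns a list of (start_mbp, end_mbp, cM_per_Mbp) for adjacent markers.
    """
    ordered = sorted(markers)
    rates = []
    for (p0, g0), (p1, g1) in zip(ordered, ordered[1:]):
        span = p1 - p0
        if span <= 0:
            continue  # skip markers that share a physical position
        rates.append((p0, p1, (g1 - g0) / span))
    return rates


# Hypothetical markers on a ~51 Mbp chromosome (illustrative values only).
example_markers = [
    (0.5, 0.0), (3.0, 20.0), (8.0, 45.0), (12.0, 50.0),
    (40.0, 51.0), (45.0, 70.0), (50.5, 100.0),
]

SUPPRESSED_BELOW = 1.0  # cM per Mbp; arbitrary cutoff for this example

rates = local_recombination_rates(example_markers)
for start, end, rate in rates:
    label = "suppressed" if rate < SUPPRESSED_BELOW else "recombining"
    print(f"{start:5.1f}-{end:5.1f} Mbp: {rate:6.3f} cM/Mbp ({label})")
```

In this invented data set, the single interval from 12 to 40 Mbp covers
more than half of the chromosome but contributes only 1 cM, which is the
kind of pattern the flat central run in Fig. 3 represents.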

While the current genome assembly and gene annotations 
compare favorably with all other high-quality whole-genome
shotgun-sequenced plant genomes (see Supplemental
Table 4 of Schmutz et al. 2010 for a comparison of genomes),
both the assembly and gene annotations do contain
known errors. The genome assembly includes ~377 identified
physical gaps, some regions of probable
misassembly exist in pericentromeric regions, and gene models
can often be improved by addition of new and
higher-quality data. A revision of the gene models is anticipated in
early 2012 (Jeremy Schmutz, pers. comm.), and the genome 
assembly itself will undergo a revision, on the basis of new 
marker and other data, later in 2012 or 2013 (Perry Cregan 
and Jeremy Schmutz, pers. comm.). 

Applications of the soybean genome in gene identification
and crop improvement

The availability of the soybean genome sequence has quick- 
ly enabled the identification of genes for numerous impor- 
tant traits. Several prominent examples include identifica- 
tion of genes that affect the following traits: resistance to 
Asian Soybean Rust (Meyer et al. 2009); the seed antinutri- 
tional components stachyose and raffinose (Skoneczka et al. 
2009); seed oil quality via fatty acid dehydrogenases (Pham 
et al. 2011); seed taste and rancidity via lipoxygenase en- 
zymes (Lenis et al. 2010); the seed antinutritional compound 
phytate (Saghai Maroof et al. 2009); the basis for plant
determinacy (Tian et al. 2010); and resistance to soybean
mosaic virus (Wen et al. 2011).

Most of these cases have progressed first from QTL stud- 
ies, and the genomic sequence has enabled rapid selection of 
new markers to narrow the QTL region, and then identifica- 
tion of candidate genes for testing via gene complementation 
tests. The process of selecting candidate genes from a region
is often aided by gene annotations that have been determined 
by homology to genes in other plants such as Arabidopsis. 
The genome sequence also makes possible the design of nu- 
merous genetic markers in a region of interest, and scoring 
of those markers in a population that includes lines with and 
without the trait of interest. This haplotype 'association' ap- 
proach, applied on a large scale, may make it possible to re- 
duce the sizes of QTL regions for a broad range of traits. 
New genomic tools may, however, be used with increasing 
frequency. Some of these are described below.
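
As a rough illustration of the marker-scoring step described above, the
sketch below ranks markers by how strongly one allele separates lines
with the trait from lines without it. It is a deliberately simple
contrast of allele frequencies, not a statistical association test; the
marker names, genotypes and trait calls are invented, and the lines are
assumed to be inbred (one allele per line per marker).

```python
from collections import Counter


def allele_trait_contrast(genotypes, has_trait):
    """Score each marker by the difference in allele frequency between groups.

    genotypes: dict mapping marker name -> list of alleles, one per line.
    has_trait: list of booleans, one per line, in the same order.
    Both groups (with and without the trait) must be non-empty.
    """
    n_with = sum(has_trait)
    n_without = len(has_trait) - n_with
    scores = {}
    for marker, alleles in genotypes.items():
        # Track one reference allele (the most common one) per marker.
        reference = Counter(alleles).most_common(1)[0][0]
        with_count = sum(1 for a, t in zip(alleles, has_trait) if t and a == reference)
        without_count = sum(1 for a, t in zip(alleles, has_trait) if not t and a == reference)
        scores[marker] = abs(with_count / n_with - without_count / n_without)
    return scores


# Hypothetical data: eight lines, the first four carrying the trait.
has_trait = [True, True, True, True, False, False, False, False]
genotypes = {
    "marker_1": list("AAAAGGGG"),  # allele tracks the trait perfectly
    "marker_2": list("AGAGAGAG"),  # allele unrelated to the trait
    "marker_3": list("CCCTTTTC"),  # partial association
}

scores = allele_trait_contrast(genotypes, has_trait)
best = max(scores, key=scores.get)
print(scores, best)
```

A real analysis would replace the frequency contrast with an appropriate
statistical test and would account for population structure; the sketch
only shows how scored markers in a region can be ranked against a trait.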

Some computational resources for soybean research 

The soybean genome sequence has provided a common
reference frame for genomic features (genes, regulatory
elements, transposons, other repeat sequences, markers, etc.)
from both soybean and related species. This has
enabled development of several capable genome browsers,
each with different specializations and capabilities (Table 1).
Some strengths of the Phytozome soybean browser
(http://www.phytozome.net) are mappings of datasets that support
gene models. Views are limited to 500 kbp, but useful
features include alignments of plant peptides, soybean ESTs,
and VISTA (conservation) plots from other plant species.
Some strengths of the SoyBase genome browser and the 
Soybean Breeder's Toolbox (http://soybase.org) are the inte- 
gration of genetic map, trait, and genome sequence data, the 
ability to search and view at a scale of the whole genome or 
whole chromosomes, views of RNA-seq transcriptome
expression patterns from many tissues, and views of the
soybean genome compared with itself and with the other model
legume genomes. Some strengths of the Legume Infor- 
mation System (http://comparative-legumes.org) are the 
capacity to do multi-gene searches against multiple target 
databases, and the integration of the genomes of three ref- 
erence legumes through reciprocal synteny plots between 
these genomes. Some strengths of 'Legumebase'
(http://www.legumebase.brc.miyazaki-u.ac.jp) include catalogs of
resources for soybean breeding, including recombinant in- 
bred lines, and wild accessions and cultivars. Numerous 
other computational and community resources for soybean 
are listed at http://soybase.org/.

Genomic relatives of soybean with potential for soybean
improvement

Soybean is in the Phaseoleae tribe, which contains a remark- 
ably large number of other plants that are used as food crops 
(Fig. 4). This is worth noting in a review of the soybean ge- 
nome both because of the many traits that have been under 
independent selection across the numerous cultivated spe- 
cies in this group and because of the relative similarity and 
stability of genomes in the Phaseoleae. Both factors suggest 
that knowledge gained about the molecular basis of traits in 
any of these species is likely to transfer well to other species 
in the group. As an example of this sort of knowledge trans- 
fer, the genes for determinacy in soybean were identified by 
homology to the Dt1 gene in common bean, which was in
turn identified as a candidate for dwarfing via its homology
to the Tfl1 (terminal flower 1) gene in Arabidopsis thaliana
(Kwak et al. 2008, Tian et al. 2010). 

Before examining the cultivated species in the Phaseoleae, 
some taxonomic background may be helpful. Plants in the 
Phaseoleae are often informally referred to as the "warm- 
season" legumes, to contrast them with the "cool-season" 
legumes such as pea, medics, clovers and vetches. These two
groups occur, respectively, in the phaseoloid/millettioid
clade and the Hologalegina clade, which are separated by a
substantial evolutionary distance, having diverged ca. 54 Mya
(Lavin et al. 2005). In contrast, all domesticated species in
the Phaseoleae shared a common ancestor at approximately
19 Mya. The majority of domesticated beans (those within
subtribe Phaseolinae, Fig. 4) shared a common ancestor
within approximately 11 Mya (Lavin et al. 2005), while
Glycine is in a clade that separated from the Phaseolinae at
around 19 Mya.

Table 1. Some on-line resources for soybean research. More soybean-specific resources are listed first, and broader plant- or clade-level
databases or resources are listed below.

SoyBase  http://soybase.org
    Trait (QTL) and marker data; transposon database; metabolic pathways; genome browser with
    expression and comparative data; full chromosome-scale browser views, and synteny data
    with comparisons to soy duplications and other legume genomes.

Legumebase  http://www.legumebase.brc.miyazaki-u.ac.jp
    Extensive information about legume lines (cultivated, plant introduction, wild, mutants);
    access to seed stocks; clones for full-length cDNAs; RIL populations.

Soybean knowledge base, soykb  http://soykb.org/
    Soybean microarray, transcriptomic, proteomic, pathway, phenotype data.

Soybean Functional Genomics Database, SFGD  http://bioinformatics.cau.edu.cn/SFGD/
    Soybean gene coexpression networks.

SoyDB  http://casp.rnet.missouri.edu/soydb/
    Transcription factors for soybean, including predicted structural characteristics, protein family
    characteristics.

Phytozome  http://phytozome.net
    Bulk datasets; genome browser with close views (up to 500 kbp); numerous gene-related
    browser tracks; plant gene families.

Legume Information System, LIS  http://comparative-legumes.org
    Multi-sequence queries against various legume databases; genome browsers for Medicago and
    Lotus integrated with SoyBase soybean browser; comparative and synteny data; legume gene
    families; whole-genome views of synteny and multi-sequence queries.

Legume Integrative Platform, LegumeIP  http://plantgrn.noble.org/LegumeIP/
    Expression data (from microarrays and short-read sequences) from soybean and other
    legumes. Search, synteny comparisons, gene families.

Plant Genome Duplication Database, PGDD  http://chibba.agtec.uga.edu/duplication/
    Display of corresponding regions between soybean genes and regions and other plant genomes.

Besides the relatively recent divergence of the Phaseoleae,
most of the species in the tribe for which chromosome
numbers have been determined have a chromosome count of
1N = 11, suggesting substantial conservation of genome
structure. In contrast, chromosome counts in the Hologalegina
vary more widely (with 1N = 7 and 8 most common),
suggesting more frequent genomic rearrangements. An
exception to genomic conservation across most of the Phaseoleae
is, perhaps unfortunately, soybean. Glycine max (and most
other species in the genus), with 1N = 20 chromosomes, has
experienced both a genome duplication and subsequent
rearrangements. Nevertheless, significant conservation does
exist between Glycine and other Phaseoleae species that have
been used in comparisons, chiefly Vigna and Phaseolus.
Soybean shows extensive synteny with cowpea: for
example, the whole of cowpea chromosome 5 is syntenic
with soybean chromosome 14 (Gm14) and with homoeologous
segments on Gm02 and Gm17 (Muchero et al. 2009).
Similarly, common bean shows extensive synteny with
soybean (McClean et al. 2010). The synteny between soybean
and both bean and cowpea tends to be in large chunks,
ranging from perhaps a tenth of a chromosome to nearly a full
chromosome; and in each case, the phaseoloid chromosome
regions each match two soybean regions, because of the
duplication in Glycine. This is apparent in Fig. 5, which shows
correspondences between soybean chromosomes Gm06 and
Gm04 with Phaseolus linkage group Pv01.

While discussing soybean relatives and their potential for 
soybean improvement, we would be remiss not to mention 
the perennial relatives of soybean: those Glycine sp. in the 
subgenus Glycine. These include approximately 28 species, 
not all formally recognized. Although none of these species 
apart from G. max have been domesticated, many of them 
possess traits of potential utility for soybean improvement. A 
few of those traits include resistance, in Glycine tomentella, 
to soybean cyst nematode (Campbell et al. 2000), resistance 
in various perennial Glycine species to soybean fungal
pathogens (Hartman et al. 2000), resistance in various
perennial Glycine species to bean pod mottle virus (Zheng et al.
2005), and tolerance in three Glycine species to salt stress 
(Kao et al. 2006). Although embryo rescue has allowed some 
crosses to be made with soybean, reproductive barriers will 
prevent easy gene flow into the primary soybean gene pool. 

The combination of genomic conservation across the
Phaseoleae and the independent domestication process in
the constituent species bodes well for identifying
corresponding loci and traits of value across this tribe. As will be
described in more detail below, various species in the tribe
harbor traits that may be of value in the ongoing breeding
efforts in soybean, including drought and flooding tolerance,
resistance to various pathogens, and nutritional and growth
characteristics.

Fig. 4. Phylogeny of soybean and some related species. Genera that
include soybean and other domesticated bean species are shown, along
with other selected model legume species. Estimated coalescence
times (times to common ancestral nodes) are inferred from
phylogenies and datings in Lavin et al. (2005) and Stefanovic and Doyle
(2009).

The majority of agronomic species in the Phaseoleae fall
within the Phaseolinae sub-tribe, in the genera Vigna,
Phaseolus, Dolichos, Canavalia and Macrotyloma. In
Phaseolus, cultivated species include P. vulgaris (common
bean, green bean, shelling bean, popping bean, dry bean),
P. coccineus (scarlet runner bean), P. lunatus (lima bean),
and P. acutifolius (tepary bean).
In Vigna, cultivated species include V. angularis (adzuki
bean), V. aconitifolia (moth bean), V. mungo (urad or
black dal), V. radiata (mung bean or green gram),
V. umbellata (rice bean) and
V. subterranea (Bambara groundnut). Other food legumes in
Phaseolinae include Dolichos
lablab (hyacinth bean, common in South and Southeast
Asia); Canavalia sp. (jack-bean and sword-bean) and
Macrotyloma geocarpum (Hausa or Kersting's groundnut).
Food legumes outside the Phaseolinae group but within
Phaseoleae include Cajanus cajan (pigeonpea); Pachyrhizus
spp. (including P. erosus, or jicama; P. ahipa, or Andean
yam bean; and P. tuberosus, or Amazonian yam bean);
Psoralea esculenta ("prairie turnip", used for its edible
tuberous taproot by native American Indians in the western
Great Plains of the United States); Amphicarpaea bracteata
("hog peanut", occasionally used for its edible seeds,
which are buried by the plant underground, similar to
peanuts); Psophocarpus tetragonolobus (winged bean, used
for its edible seeds, pods, tubers and leaves in south-east
Asia); Mucuna pruriens (velvetbean, used medicinally) and
Apios americana (historically used as a staple food for its
edible tubers by American Indians in the eastern United States).

Fig. 5. Comparison of two soybean chromosomes with a Phaseolus
linkage group. Sequence-based markers in Phaseolus vulgaris linkage
group Pv01 (center) are compared with soybean chromosomes Gm06
(left) and Gm04 (right). The comparisons are modified from
comparative map displays at the Legume Information System
(http://comparative-legumes.org). The Phaseolus map is the 2009 map of
Conserved Orthologous Sequences from Doug Cook (Choi et al. 2004,
2006).

Prospects for translating information between species 
in the Phaseoleae

With the soybean genome sequence essentially complete 
and genome sequences well underway (at the time of writ- 
ing) for common bean, cowpea and pigeonpea, it should be 
possible to precisely identify most corresponding loci across 
these species. 

There will be, however, some predictable barriers to 
comparisons between the genomes of soybean and species in 
other phaseoloid clades. The first difference is that the peri- 
centromeric regions of soybean have evidently expanded 
dramatically within approximately the last 10 million years. 
Expansion of the pericentromeres is evident in comparisons
of the soybean genome to itself. In pericentromeric regions,
many genes have been lost in one or the other homoeologous
regions, and existing genes have been moved apart by
insertion of transposons. This can be seen in Fig. 2, in which
synteny remaining from the ~13 Mya Glycine genome
duplication is apparent as essentially unbroken lines of collinear
genes in the corresponding euchromatic chromosome ends
of Gm10 and Gm20, but the synteny in the corresponding
chromosomal centers has been disrupted by transposon
insertions in the two respective chromosomes. In a case
described by Innes et al. (2008), a one-megabase region in a
euchromatic portion of Gm13 corresponds with a
heterochromatic portion of Gm15, which had expanded more than
four-fold relative to Gm13 (through transposon insertions),
and lost numerous genes.

Near the ends of synteny blocks, corresponding genomic 
contexts may also be difficult to discern. And there will be 
cases of transpositions or other unexpected rearrangements. 
An example is in a disease resistance gene in Phaseolus that 
appears to have transposed into another genomic context in 
Phaseolus relative to the location of the orthologous gene 
cluster in soybean (David et al. 2009). The cause of the 
transposition may be a satellite repeat, present in Phaseolus
and not soybean, which is present near the ends of most 
Phaseolus chromosomes and appears to mediate higher 
rates of transposition or rearrangements near the ends of 
Phaseolus chromosomes (David et al. 2009). 

Despite the loss of synteny in some regions between soy- 
bean and other phaseoloid genomes, the similarity between 
soybean and other species in this clade is high enough that 
orthologs should be readily identifiable, regardless of ge- 
nomic context. The median and modal percent identities are 
approximately 89% for alignments of published bean and
pigeonpea EST contigs and soybean genomic sequence 
(Cannon, unpublished data). 
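
Percent identity figures such as those quoted above are computed from
pairwise alignments. The following minimal sketch shows one common
convention, in which columns containing a gap in either sequence are
ignored; that convention, the function name and the short example
sequences are assumptions made for illustration and are not taken from
the analyses cited here.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over the ungapped columns of a pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue  # gap columns are excluded under this convention
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Invented 20-column alignment: one mismatch and one gap column.
seq_a = "ATGGCCATTGTAATGGGCCG"
seq_b = "ATGGCGATTGTA-TGGGCCG"
identity = percent_identity(seq_a, seq_b)
print(f"{identity:.1f}% identity")
```

Other conventions (for example, counting gap columns as mismatches)
give lower values for the same alignment, so identity figures are only
comparable when they are computed the same way.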

Approaches for making use of genetic information 
across species boundaries 

Given information that a gene modifies some trait of interest 
in, say, common bean, how might this information be used 
for improvement of soybean? A straightforward approach 
would be in design of markers for that gene — either tightly 
linked, or "perfect" (i.e. capable of directly identifying the 
desired allele from a population). Perfect soybean markers 
exist, for example, for traits such as low phytic acid and low 
raffinose/stachyose (Skoneczka et al. 2009) and determina- 
cy (Tian et al. 2010). For traits that require new genes or al- 
leles not present in soybean germplasm, transformation is 
required. A striking recent example is the addition of the 
Arabidopsis QQS gene (Li et al. 2009), with a role in regu- 
lation of starch deposition, into soybean, resulting in in- 
creases of soybean seed protein by 30 to 60%. Conventional 
Agrobacterium-mediated transformation remains a relative- 
ly slow and costly way of inserting genes. New approaches 



such as targeted mutagenesis with zinc-finger nucleases 
(ZFNs) or TAL effectors may provide more flexible, effi- 
cient methods for direct genome modification (Wood et al.). 
ZFNs have been used in maize, Arabidopsis, and soybean 
(Curtin et al. 2011, Shukla et al. 2009, Zhang et al. 2010),
but currently rely on Agrobacterium for stable
transformation. So, while these methods provide for precise
modification, a bottleneck remains in establishing stable
transformations. Regardless of the method of genome improvement
(whether marker-assisted selection, Agrobacterium
transformation, or experimental methods), the availability of the
soybean genome sequence is itself a powerful tool for
genetic improvement in soybean.